
VPU Enabling and Usage

From RidgeRun Developer Wiki






Problems running the pipelines shown on this page? Please see our GStreamer Debugging guide for help.

Introduction

The VPU, or Video Processing Unit, is the hardware block responsible for accelerating video codec operations. Instead of encoding or decoding video on the CPU, the VPU performs these operations in dedicated hardware, which reduces CPU load.

Encoder controls

The VPU encoder exposes additional V4L2 controls that can be used to modify the behavior of the hardware encoder. These controls are useful for configuring bitrate, GOP structure, latency, slice settings, and stream formatting.

To see the available encoder controls, run:

v4l2-ctl -d /dev/video33 --list-ctrls

The encoder controls are summarized below:

Control Description
video_bitrate Sets the target encoder bitrate.
video_peak_bitrate Sets the maximum peak bitrate.
video_bitrate_mode Selects the bitrate mode, such as variable bitrate.
video_gop_size Sets the GOP size, or distance between keyframes.
video_gop_closure Enables closed GOP behavior.
video_b_frames Sets the number of B-frames; a value of 0 disables them.
force_key_frame Forces the encoder to generate a keyframe.
lowlatency_mode Enables low-latency encoding mode.
frame_level_rate_control_enable Enables frame-level rate control.
slice_partitioning_method Selects the slice partitioning method.
maximum_bytes_in_a_slice Sets the maximum number of bytes per slice.
number_of_mbs_in_a_slice Sets the number of macroblocks per slice.
intra_refresh_period Sets the intra-refresh period.
prepend_sps_and_pps_to_idr Prepends SPS/PPS headers to IDR frames.
generate_access_unit_delimiters Adds access unit delimiters to the encoded stream.
complexity Sets the encoder complexity level.
hevc_enc_without_startcode Controls whether HEVC output is generated without start codes.
hevc_size_of_length_field Sets the HEVC length field size.

These controls can be passed to GStreamer through the extra-controls property of the V4L2 encoder elements, such as v4l2h264enc and v4l2h265enc.
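The extra-controls property takes a GstStructure string: a label followed by comma-separated name=value fields, where each name is a control name as printed by v4l2-ctl. The sketch below joins such pairs so they can be dropped into a pipeline. The helper name build_extra_controls is hypothetical, and the label "controls" is an assumption based on common usage (the label itself is not a control).

```shell
#!/bin/sh
# Hypothetical helper: join name=value pairs into an extra-controls string.
# "controls" is only the structure label (assumed, based on common usage);
# the field names must match the names printed by `v4l2-ctl --list-ctrls`.
build_extra_controls() {
  result="controls"
  for pair in "$@"; do
    case "$pair" in
      # Require a non-empty name before the "=".
      ?*=*) result="$result,$pair" ;;
      *) echo "invalid control '$pair' (expected name=value)" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$result"
}
```

For example, to request a 4 Mbit/s target bitrate and a 30-frame GOP (illustrative values, not tuned recommendations):

gst-launch-1.0 videotestsrc num-buffers=300 ! video/x-raw,width=1280,height=720,framerate=30/1 ! v4l2h264enc output-io-mode=dmabuf extra-controls="$(build_extra_controls video_bitrate=4000000 video_gop_size=30)" ! h264parse ! mp4mux ! filesink location=bitrate_test.mp4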

Decoder controls

The VPU decoder also exposes V4L2 controls. These controls are mainly used to configure decode behavior, display delay, low-latency operation, timestamp handling, metadata output, and buffer management.

To see the available decoder controls, run:

v4l2-ctl -d /dev/video32 --list-ctrls

The decoder controls are summarized below:

Control Description
decoder_slice_interface Enables or disables decoder slice interface mode.
display_delay Sets the display delay value.
display_delay_enable Enables display delay control.
lowlatency_mode Enables low-latency decode mode.
frame_rate Sets the decoder frame rate value.
operating_rate Sets the decoder operating rate.
ts_reorder Enables or disables timestamp reordering.
max_num_reorder_frames Reports or controls the maximum number of reordered frames.
coded_frames Reports coded-frame information.
thumbnail_mode Enables thumbnail decode mode.
priority Sets the decoder priority.
secure_mode Enables secure decode mode.
codec_config Controls codec configuration handling.
bitstream_size_overwrite Overrides the bitstream size.
meta_timestamp Enables timestamp metadata.
meta_picture_type Enables picture-type metadata.
meta_dec_qp_metadata Enables decoder QP metadata.
meta_concealed_mb_cnt Enables concealed macroblock count metadata.
meta_interlace Enables interlace metadata.
last_flag_event_enable Enables last-flag event signaling.
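When tuning either element, it helps to see the current value of each control, not only its name. The sketch below reduces --list-ctrls output to name/value pairs. It assumes the usual v4l2-ctl line layout (name, hexadecimal control ID, type, then key=value fields), and the function name ctrl_values is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "name value" for each control in
# `v4l2-ctl --list-ctrls` output read from stdin. Assumes lines shaped like:
#   <name> 0x<id> (<type>) : ... value=<v> ...
# Menu-entry lines ("0: Variable Bitrate") and controls without a value
# field (for example, button controls) are skipped.
ctrl_values() {
  awk '$2 ~ /^0x[0-9a-fA-F]+$/ {
         for (i = 3; i <= NF; i++)
           if ($i ~ /^value=/) { print $1, substr($i, 7); break }
       }'
}
```

Usage on the target: v4l2-ctl -d /dev/video32 --list-ctrls | ctrl_values. A single control can also be read directly with v4l2-ctl -d /dev/video32 --get-ctrl=display_delay.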

VPU use verification

These elements were tested on both the downloaded Ubuntu sources (Ubuntu 24.04.4) and the Yocto-built image. VPU use was verified in the GStreamer debug logs of multiple pipelines, for example:

GST_DEBUG="v4l2*:4,*h265*:4,*codec*:4" GST_DEBUG_FILE=/tmp/gst-vpu.log gst-launch-1.0 filesrc location=camera-720p30-h265.mp4 ! qtdemux ! h265parse ! v4l2h265dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
grep -Ei "v4l2h265dec|v4l2.*decoder|/dev/video3[2-3]|VIDIOC_STREAMON" /tmp/gst-vpu.log

The expected output looks similar to this:

0:00:00.052985053  4026 0xaaaaead39a50 INFO                    v4l2 v4l2_calls.c:592:gst_v4l2_open:<v4l2h265dec0:sink> Opened device 'msm_vidc_decoder' (/dev/video32) successfully
0:00:00.053088127  4026 0xaaaaead39a50 INFO                    v4l2 v4l2_calls.c:688:gst_v4l2_dup:<v4l2h265dec0:src> Cloned device 'msm_vidc_decoder' (/dev/video32) successfully
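The same check can be scripted so it returns a pass/fail status, which is handy when running many pipelines. This sketch searches a GST_DEBUG_FILE log for the "Opened device" lines shown above. The function names are hypothetical, and matching on the msm_vidc_ driver-name prefix is an assumption taken from the decoder log; the encoder may report a different name.

```shell
#!/bin/sh
# Hypothetical helpers: list the VPU device nodes a pipeline opened, based on
# the gst_v4l2_open "Opened device" lines in a GST_DEBUG_FILE log.
# The "msm_vidc_" driver-name prefix is an assumption from the decoder log.
vpu_devices_in_log() {
  grep -oE "Opened device 'msm_vidc_[a-z]+' \(/dev/video[0-9]+\)" "$1" | sort -u
}

check_vpu_log() {
  devices=$(vpu_devices_in_log "$1")
  if [ -n "$devices" ]; then
    printf 'VPU in use:\n%s\n' "$devices"
  else
    echo "No msm_vidc device opened; the pipeline may not be using the VPU" >&2
    return 1
  fi
}
```

After the pipeline finishes, run check_vpu_log /tmp/gst-vpu.log; a non-zero exit status means no matching device was opened.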

All of the V4L2 encoder and decoder elements used on this page expose the I/O tuning properties capture-io-mode and output-io-mode. On the Dragonwing 9075 EVK, both can be set to dmabuf to take advantage of hardware buffer sharing and avoid unnecessary memory copies between hardware-accelerated pipeline elements.

  • capture-io-mode: Configures the I/O mode of the V4L2 capture queue, which carries the element's output: the buffers produced on its src pad (decoded frames for a decoder, the encoded stream for an encoder). The default is auto. Supported values:
      • auto: Lets GStreamer and the V4L2 driver choose the I/O mode. This is the default and a safe choice when validating a pipeline or debugging negotiation issues.
      • rw: Uses standard read/write system calls. Use it only for basic debugging or compatibility testing.
      • mmap: Uses memory-mapped buffers allocated by the V4L2 driver. Useful when DMABUF negotiation is not available.
      • userptr: Uses user-allocated buffers passed to the V4L2 driver.
      • dmabuf: Uses DMA buffer file descriptors for buffer sharing. Recommended for hardware-accelerated pipelines on the Dragonwing 9075 EVK when downstream elements support DMABUF.
      • dmabuf-import: Imports externally allocated DMA buffers into the V4L2 element. Use this mode when another element or subsystem owns the buffers and the V4L2 element should import them.
  • output-io-mode: Configures the I/O mode of the V4L2 output queue, which carries the element's input: the buffers consumed on its sink pad (compressed data for a decoder, raw frames for an encoder). The default is auto. Supported values:
      • auto: Lets GStreamer and the V4L2 driver choose the most appropriate I/O mode.
      • rw: Uses standard read/write access.
      • mmap: Uses memory-mapped buffers.
      • userptr: Uses application-provided memory.
      • dmabuf: Uses DMA buffer file descriptors.
      • dmabuf-import: Imports DMA buffers provided by an upstream element.
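When scripting tests that switch between these modes, a small check can catch a mistyped mode name before a pipeline is launched. This is a sketch only: io_mode_args is a hypothetical helper, and the accepted names are taken from the list above.

```shell
#!/bin/sh
# Hypothetical helper: print "capture-io-mode=X output-io-mode=Y" after
# checking both values against the io-mode names listed above.
io_mode_args() {
  for mode in "$1" "$2"; do
    case "$mode" in
      auto|rw|mmap|userptr|dmabuf|dmabuf-import) ;;
      *) echo "unknown io-mode '$mode'" >&2; return 1 ;;
    esac
  done
  printf 'capture-io-mode=%s output-io-mode=%s\n' "$1" "$2"
}
```

Store the result first so a failure stops the script, then expand it unquoted so it splits into two properties: args=$(io_mode_args dmabuf dmabuf) || exit 1, followed by gst-launch-1.0 filesrc location=input_h265.mp4 ! qtdemux ! h265parse ! v4l2h265dec $args ! waylandsink.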

Testing VPU Decoders

The hardware decoders tested in this section include: v4l2h264dec, v4l2h265dec, v4l2vp9dec, and v4l2av1dec.

H.264 MP4 decode

gst-launch-1.0 filesrc location=input_h264.mp4 ! qtdemux ! h264parse ! v4l2h264dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
Expected result

Plays the H.264 video.

H.264 MKV decode

gst-launch-1.0 filesrc location=input_h264.mkv ! matroskademux ! h264parse ! v4l2h264dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
Expected result

Plays the H.264 MKV video.

H.265 MP4 decode

gst-launch-1.0 filesrc location=input_h265.mp4 ! qtdemux ! h265parse ! v4l2h265dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
Expected result

Plays the H.265 video.

H.265 MKV decode

gst-launch-1.0 filesrc location=input_h265.mkv ! matroskademux ! h265parse ! v4l2h265dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
Expected result

Plays the H.265 MKV video.

VP9 decode

gst-launch-1.0 filesrc location=input_vp9.webm ! matroskademux ! vp9parse ! v4l2vp9dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
Expected result

Plays the VP9 video.

AV1 decode

gst-launch-1.0 filesrc location=input_av1.mp4 ! qtdemux ! av1parse ! v4l2av1dec capture-io-mode=dmabuf output-io-mode=dmabuf ! waylandsink -v
Expected result

Plays the AV1 video.
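The six decode tests above differ only in the demuxer, parser, and decoder, so they can be run from one loop. In this sketch, decode_chain and run_decode_tests are hypothetical helpers; they assume the input_<codec>.<container> file names used on this page and that each parser and decoder is named <codec>parse and v4l2<codec>dec, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: derive the demux/parse/decode chain for one test.
# Assumes the <codec>parse / v4l2<codec>dec naming used on this page.
decode_chain() {
  codec="$1"; container="$2"
  case "$container" in
    mp4) demux=qtdemux ;;
    mkv|webm) demux=matroskademux ;;
    *) echo "unsupported container '$container'" >&2; return 1 ;;
  esac
  case "$codec" in
    h264|h265|vp9|av1) ;;
    *) echo "unsupported codec '$codec'" >&2; return 1 ;;
  esac
  printf '%s ! %sparse ! v4l2%sdec capture-io-mode=dmabuf output-io-mode=dmabuf\n' \
    "$demux" "$codec" "$codec"
}

# Hypothetical runner: plays each input_<codec>.<container> file in turn.
run_decode_tests() {
  for entry in h264:mp4 h264:mkv h265:mp4 h265:mkv vp9:webm av1:mp4; do
    codec=${entry%%:*}; container=${entry#*:}
    chain=$(decode_chain "$codec" "$container") || return 1
    echo "== $codec in $container"
    # $chain is deliberately unquoted so it splits into gst-launch arguments.
    gst-launch-1.0 filesrc location="input_$codec.$container" ! $chain ! waylandsink \
      || echo "FAILED: $codec in $container" >&2
  done
}
```

Calling run_decode_tests on the target plays each file in sequence and reports any pipeline that exits with an error.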

Testing VPU Encoders

The hardware encoders tested in this section are v4l2h264enc and v4l2h265enc.

Create H.264 MP4

gst-launch-1.0 videotestsrc num-buffers=300 pattern=smpte ! video/x-raw,width=1920,height=1080,framerate=30/1 ! v4l2h264enc output-io-mode=dmabuf ! h264parse ! mp4mux ! filesink location=input_h264.mp4 -v
Expected result

Creates input_h264.mp4, an H.264 video in an MP4 container.

Create H.264 MKV

gst-launch-1.0 videotestsrc num-buffers=300 pattern=snow ! video/x-raw,width=1280,height=720,framerate=30/1 ! v4l2h264enc output-io-mode=dmabuf ! h264parse ! matroskamux ! filesink location=input_h264.mkv -v
Expected result

Creates input_h264.mkv, an H.264 video in an MKV container.

Create H.265 MP4

gst-launch-1.0 videotestsrc num-buffers=300 pattern=smpte ! video/x-raw,width=1920,height=1080,framerate=30/1 ! v4l2h265enc output-io-mode=dmabuf ! h265parse ! mp4mux ! filesink location=input_h265.mp4 -v
Expected result

Creates input_h265.mp4, an H.265 video in an MP4 container.

Create H.265 MKV

gst-launch-1.0 videotestsrc num-buffers=300 pattern=snow ! video/x-raw,width=1280,height=720,framerate=30/1 ! v4l2h265enc output-io-mode=dmabuf ! h265parse ! matroskamux ! filesink location=input_h265.mkv -v
Expected result

Creates input_h265.mkv, an H.265 video in an MKV container.
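Because these files are the inputs for the decoder tests, it can be useful to regenerate them in one step and flag any that come out missing or empty. This is a sketch under stated assumptions: muxer_for and encode_test_file are hypothetical helpers, and the pattern and resolution arguments mirror the four commands above.

```shell
#!/bin/sh
# Hypothetical helper: map a container name to its muxer element.
muxer_for() {
  case "$1" in
    mp4) echo mp4mux ;;
    mkv) echo matroskamux ;;
    *) echo "unsupported container '$1'" >&2; return 1 ;;
  esac
}

# Hypothetical helper: encode 300 test frames to input_<codec>.<container>.
encode_test_file() {
  codec="$1"; container="$2"; pattern="$3"; width="$4"; height="$5"
  mux=$(muxer_for "$container") || return 1
  out="input_$codec.$container"
  gst-launch-1.0 videotestsrc num-buffers=300 pattern="$pattern" \
    ! "video/x-raw,width=$width,height=$height,framerate=30/1" \
    ! "v4l2${codec}enc" output-io-mode=dmabuf ! "${codec}parse" ! "$mux" \
    ! filesink location="$out" || return 1
  # [ -s ] is false for a missing or zero-length file.
  [ -s "$out" ] || { echo "$out is missing or empty" >&2; return 1; }
}
```

The four files above then correspond to encode_test_file h264 mp4 smpte 1920 1080, encode_test_file h264 mkv snow 1280 720, encode_test_file h265 mp4 smpte 1920 1080, and encode_test_file h265 mkv snow 1280 720.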

Performance

Glass-to-Glass Measurements

This section documents glass-to-glass latency measurements with different display sinks. For kmssink, the display manager was stopped so that kmssink could take full ownership of the display.

The camera used was a MIPI IMX577.

Pipeline using waylandsink:

gst-launch-1.0 -e qtiqmmfsrc name=camsrc camera=0 ! 'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1,interlace-mode=progressive,colorimetry=bt601' ! waylandsink

Pipeline using kmssink:

Before running the tests with kmssink, stop the graphical interface and switch to multi-user mode:

sudo systemctl isolate multi-user.target

Then run:

gst-launch-1.0 -e qtiqmmfsrc name=camsrc camera=0 ! 'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1,interlace-mode=progressive,colorimetry=bt601' ! kmssink

Results

Sink          4K latency (s)   720p latency (s)
kmssink       0.147339038      0.108750446
waylandsink   0.15126692       0.12861633
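The latencies are easier to compare in milliseconds. The sketch below converts the values from the results table and computes how much lower kmssink is than waylandsink at each resolution. It only does arithmetic on the published numbers; the function name latency_summary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "sink 4k_seconds 720p_seconds" rows from stdin and
# print each latency in ms plus the waylandsink - kmssink difference.
latency_summary() {
  awk '
    { k4[$1] = $2; k720[$1] = $3 }
    END {
      printf "4K:   kmssink %.1f ms, waylandsink %.1f ms, delta %.1f ms\n",
        k4["kmssink"] * 1000, k4["waylandsink"] * 1000,
        (k4["waylandsink"] - k4["kmssink"]) * 1000
      printf "720p: kmssink %.1f ms, waylandsink %.1f ms, delta %.1f ms\n",
        k720["kmssink"] * 1000, k720["waylandsink"] * 1000,
        (k720["waylandsink"] - k720["kmssink"]) * 1000
    }'
}

# Values copied from the results table; nothing is measured here.
# Prints:
#   4K:   kmssink 147.3 ms, waylandsink 151.3 ms, delta 3.9 ms
#   720p: kmssink 108.8 ms, waylandsink 128.6 ms, delta 19.9 ms
latency_summary <<'EOF'
kmssink 0.147339038 0.108750446
waylandsink 0.15126692 0.12861633
EOF
```

In these measurements, kmssink is roughly 4 ms faster at 4K and roughly 20 ms faster at 720p.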
